LeoTask: a fast, flexible and reliable framework for computational research

Authors

  • Changwang Zhang
  • Shi Zhou
  • Benjamin M. Chain
Abstract

Summary: LeoTask is a Java library for computation-intensive and time-consuming research tasks. It automatically executes tasks in parallel on multiple CPU cores of a computing facility. It uses a configuration file to enable automatic exploration of the parameter space and flexible aggregation of results, and therefore allows researchers to focus on programming the key logic of a computing task. It also supports reliable recovery from interruptions, dynamic and cloneable networks, and integration with the plotting software Gnuplot.

Availability and implementation: The source code for LeoTask is freely available under the FreeBSD License at https://github.com/mleoking/leotask.

Contact: [email protected]

Research tasks, especially in the field of bioinformatics, are increasingly computationally intensive (1). Computational research (e.g. simulations and data analysis) typically explores results over a large parameter space and often needs to repeat a task a number of times to get an average result. Many complex computational tasks last for days, or even weeks. As a result, they are prone to artificial (e.g. a colleague sharing the same computing facility stops your program) or natural (e.g. power outage) interruptions. To accelerate research, it is imperative to conduct tasks in parallel, fully utilising the processing power of computing facilities (1).

Nowadays, computing facilities normally have multiple cores in their Central Processing Unit (CPU), and each of the cores can individually conduct a processing task (2; 3). For example, a recent desktop computer can have 4 to 8 cores, while a computing server can have more than 16 cores. While there are built-in mechanisms for running tasks in parallel in all major programming languages, their complexity often makes them difficult to use and time-consuming to program with. For example, the built-in parallel running mechanism in Java requires choosing the parts of the program that can be accessed by multiple tasks in parallel and the parts that should be accessed by only one task at a time. Such choices affect not only the speed of a program but also its accuracy, i.e. a wrong choice could result in a program that is slow or that gives wrong results.

Reliability is a critical but often overlooked feature of many computational programs and frameworks. A reliable program should be able to recover from interruptions and continue running. Much effort is often needed to make a program reliable, especially if the program runs in parallel.

Here we present a framework, LeoTask, to facilitate reliably conducting research tasks in parallel. It has the following combination of features, which should be attractive to the research community:

  • Automatic & parallel parameter space exploration: LeoTask uses an intuitive configuration file to specify the value ranges of task parameters. The framework works out all possible combinations of values for all parameters and then runs tasks with different parameter value combinations in parallel, i.e. LeoTask automatically explores the parameter space. The framework maps all the tasks onto the available processing cores of a computing facility and runs them.
  • Flexible & configuration-based result aggregation: The configuration file also sets when and how task results are aggregated. As shown in Figure 1, LeoTask has a default task flow with 6 default time points for specifying when the results are collected. Applications using the framework can also use different task flows and define additional time points. The framework supports aggregating results conditioned on a set of parameters. For example, for a task with parameters x1 and x2 and result y, given the value ranges of x1 and x2, the framework can aggregate y conditioned on the value of x1, of x2, of the value pair (x1, x2), or of any mathematical function of x1 and x2.
  • Programming model focusing only on the key logic
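The first two features above can be illustrated with a minimal sketch in plain Java. It does not use LeoTask's own API or configuration file: the class `ParameterSweep`, the type `Run`, and the methods `task`, `explore` and `meanByX1` are hypothetical names introduced only for this illustration, and the standard `parallelStream()` call stands in for the framework's scheduling of tasks onto cores. The sketch enumerates every (x1, x2) combination, evaluates a placeholder task on each combination in parallel, and then averages the result y conditioned on x1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Illustrative only: these names are assumptions, not part of LeoTask.
class ParameterSweep {

    /** One point (x1, x2) of the parameter space and the result y computed for it. */
    static final class Run {
        private final int x1;
        private final int x2;
        private final double y;

        Run(int x1, int x2, double y) {
            this.x1 = x1;
            this.x2 = x2;
            this.y = y;
        }

        int x1() { return x1; }
        int x2() { return x2; }
        double y() { return y; }
    }

    /** Placeholder for the researcher's key logic (hypothetical formula). */
    static double task(int x1, int x2) {
        return x1 * 10.0 + x2;
    }

    /** Builds all (x1, x2) combinations and evaluates the task on each in parallel. */
    static List<Run> explore(List<Integer> x1Values, List<Integer> x2Values) {
        List<int[]> grid = new ArrayList<>();
        for (int x1 : x1Values) {
            for (int x2 : x2Values) {
                grid.add(new int[] {x1, x2});
            }
        }
        // The runs are independent, so a parallel stream may spread them over the cores.
        return grid.parallelStream()
                .map(p -> new Run(p[0], p[1], task(p[0], p[1])))
                .collect(Collectors.toList());
    }

    /** Averages y conditioned on x1, i.e. over all values of x2. */
    static Map<Integer, Double> meanByX1(List<Run> runs) {
        return runs.stream().collect(
                Collectors.groupingBy(Run::x1, TreeMap::new,
                        Collectors.averagingDouble(Run::y)));
    }

    public static void main(String[] args) {
        List<Run> runs = explore(List.of(1, 2), List.of(0, 1, 2));
        System.out.println(meanByX1(runs)); // prints {1=11.0, 2=21.0}
    }
}
```

Aggregating on x2, on the pair (x1, x2), or on a function of both only requires a different classifier in `groupingBy`. The sketch works only because each run is independent and shares no mutable state; recovery from interruptions, which the abstract also mentions, is not covered here.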


Similar resources

A Hybrid Framework for Building an Efficient Incremental Intrusion Detection System

In this paper, a boosting-based incremental hybrid intrusion detection system is introduced. This system combines incremental misuse detection and incremental anomaly detection. We use a boosting ensemble of weak classifiers to implement the misuse intrusion detection system. It can identify new types of intrusions that do not exist in the training dataset for incremental misuse detection. As...


A New Reliable Controller Placement Model for Software-Defined WANs

Software-Defined Networking (SDN) is a decoupled architecture that enables administrators to build a customizable and manageable network. Although the decoupled control plane provides flexible management and facilitates the task of operating the network, it is the vulnerable point of failure in SDN. To achieve a reliable control plane, multiple controllers are often needed so that each switch must...


Developing Reliable yet Flexible Software through If-Then Model Transformation Rules

Developing reliable yet flexible software is a hard problem. Although modeling methods enjoy a lot of advantages, the exclusive use of just one of them, in many cases, may not guarantee the development of reliable and flexible software. Formal modeling methods ensure reliability because they use a rigorous approach to software development. However, lack of knowledge and high cost practically fo...


Fast System Matrix Calculation in CT Iterative Reconstruction

Introduction: Iterative reconstruction techniques provide better image quality and have the potential for reconstructions with lower imaging dose than classical methods in computed tomography (CT). However, computational speed is a major concern for these iterative techniques. The system matrix calculation during the forward- and back-projection is one of the most time- cons...


A FAST FUZZY-TUNED MULTI-OBJECTIVE OPTIMIZATION FOR SIZING PROBLEMS

The most recent approaches to multi-objective optimization apply meta-heuristic algorithms, for which parameter tuning is still a challenge. The present work hybridizes swarm intelligence with fuzzy operators to extend crisp values of the main control parameters into special fuzzy sets that are constructed based on a number of prescribed facts. Such parameter-less particle ...



Journal:
  • CoRR

Volume: abs/1501.01678

Publication date: 2015